IIA1.PUB[NSF,MUS]1 - www.SailDart.org

perm filename IIA1.PUB[NSF,MUS]1 blob sn#096539 filedate 1974-04-10 generic text, type C, neo UTF8
.SELECT B
.BEGIN CENTER
II. RESEARCH PROPOSAL
.END
.GROUP SKIP 2
.SELECT 5
note to the reader
.SELECT 1
.BEGIN FILL ADJUST
In the  following section we  present the nature  of our  current and
proposed  research.   Because  of  the scope  of  our  research, this
presentation is necessarily lengthy.  To aid the reader  in obtaining
an  overview  of  this  material,   we  have  adopted  the  following
convention  in  this section.  Introductory  and summary  materials,
including references to pertinent Figures and Recorded Examples, are
presented at the head of all  divisions  of  this section.    More  %5detailed
presentations%1 follow  immediately as  subsections which have
%5headings in lower-case  italicized type (as here  printed)%1.   The
reader who  first  desires to  get an  overview of  our work,  before
becoming involved  with the %5details%1, can use this format as a guide.
.END
.GROUP SKIP 2
.SELECT B
.BEGIN CENTER
A. SIMULATION OF MUSIC INSTRUMENT TONES
.END
.SELECT 1

.BEGIN FILL ADJUST

In this part  of the proposal we  will discuss our approaches  to the
computer simulation of  music instrument tones.  The main goal of our
research is the development of a powerful, general  purpose technique
for the simulation of auditory  signals that will have the perceptual
complexity  and naturalness of the musical  sounds which occur in the
real world.   The fundamental concern here  is with the synthesis  of
natural  timbres of  the extremely  varied and  highly  complex tones
which occur in music.  The definition of timbre most accepted in  the
literature  of  auditory  theory  is  that  stated  by  the  American
Standards  Association (1960): "Timbre is  that attribute of auditory
sensation in  terms of which  a listener  can judge  that two  sounds
similarly  presented  and having  the  same  loudness and  pitch  are
dissimilar."  It is  added that:  "Timbre depends primarily  upon the
spectrum of the stimulus, but it also depends upon  the waveform, the
sound  pressure, the  frequency  location of  the  spectrum, and  the
temporal characteristics of the stimulus."

The immediate problem we face in designing a successful algorithm for
the computer simulation  of music instrument tones involves the exact
nature of the psychophysical relationships  in timbre, that is,   the
relationships  between  the physical  properties  of  sounds and  the
subjective,  psychological qualities  by which  they are perceptually
differentiated when  presented at the  same loudness  and pitch.   We
have found  that there  has been relatively  little work done  in the
last century of  auditory research  investigating the  psychophysical
relationships in timbre perception. Of the work that has been done,
most has relied on such restrictive definitions of
timbre that its findings are of no use
in our present attempt to design simulation algorithms. This should be
evident  if  we  examine  the  real  lack  of  clarity  shown in  the
definition  given  in  the  last  paragraph,  where  timbre  is  most
precisely pinpointed by a negative formulation: whatever is not pitch
and not loudness (and, we might add,  not duration and not  location)
is `timbre'. Among these extremely complex acoustical left-overs,
researchers have focused mainly on the (so-called) `steady-state'
spectrum of  stimuli as  the dominant,  if not  exclusive, factor  in
timbre. 

It is becoming clear that this is only one of many factors in timbre
(if indeed there is even an actual `steady-state' in real tones,
that is, a duration of stability in which the amplitudes of the
harmonic components of a tone remain constant). From the results of
recent  analyses of real  tones,   as well as  attempts to synthesize
natural-sounding tones, it appears  that real tones have  complex and
ever-changing  physical properties,   and  that the  nature  of these
temporal changes is  most probably an  extremely important factor  in
timbre (Luce,  1963; Risset,  1966; Strong & Clark, 1967a, 1967b;
Freedman, 1967, 1968). The vagueness of the definition cited above
seems to reflect accurately the actual state of knowledge to date
concerning timbre perception, and a more precise and
useful statement requires fundamental research which has yet to be
done. We are now in a position to accomplish such research, given
the possibilities presented by the digital computer.

Supporting  our  goal  to  develop  a   technique  for  the  computer
simulation  of  natural tones,  therefore, is  a  concurrent research
effort to  formulate a  model which  is able  to  describe the
perception of musical sounds. The theoretical problem is to establish
a  set of acoustical dimensions, those  which are actually salient in
the perception  of musical  timbre,   and to  design a  computational
algorithm that enables the user to exert the most direct control with
respect to  these dimensions  for the  purposes of  simulation.   The
empirical problem which follows is the determination of those aspects
of  the signal  which are  actually important  in the  perception and
identification of a sound. Necessarily included are a study of the
distinctive features of signals and an investigation of the physical
conditions which contribute to the naturalness of a signal. Related
research  should  examine  the  general  characteristics   of  timbre
perception,    looking into  the  effects of  such  phenomena as  the
categorical identification of musical sounds.
.NEXT PAGE
The discussion which follows  is concerned with a description  of our
systematic approach to a general model for simulation, which is based
on perceptual verification at every step. The two strategies which
we have used for simulation are additive synthesis, which is based on
the analysis of real tones, and frequency-modulation synthesis. The
first method discussed below, additive synthesis based on analysis,
sets data reduction as its research goal. We start with the
most complete,  complex information about a real signal given through
its analysis.   We then systematically  step in the direction  of the
most  simple   representation  of  the  signal   which  can  be  used
successfully to reproduce  the original  tone by additive  synthesis.
To this end we examine the perceptually important aspects of physical
signals. 
.END
.BEGIN FILL ADJUST
The second  approach towards a  model for simulation  discussed below
begins  with a  simpler,   more easily-controlled  process, frequency
modulation synthesis of  sound.   This technique allows  the user  to
directly manipulate aspects of the  signal that we subsequently found
to be very meaningful in terms  of certain perceptual  cues for music
instrument tones.    The  success of  this  method  first came  as  a
surprise  because   the  physical  waveform  that   it  generates  is
strikingly different from that of any natural signal. However, on
closer inspection we determined the reasons for this success, and
thereby began to learn which physical dimensions are perceptually
important for timbre. The direction of this approach is to increase
the complexity of the synthesis process,  until there is control over
a very wide set of features which occur in natural tones.

Following the detailed  description of these  two methods is  a third
section devoted to  a discussion of the ultimate aim of our research:
the development  of  powerful,  general-purpose  algorithms  for  the
computer simulation  of natural tones.   This more  general algorithm
will  be an outgrowth  of the interdependency and  convergence of our
two  approaches to  synthesis.  The  two approaches  do  not  proceed
independently  of  one  another,  but  interact  at  several  levels.
Findings in one technique  can immediately be  applied to the  other,
and a system  for cross-verification is thereby established.  In this
way, a convergence of these two methods is approached. The
simplicity and perceptual meaningfulness of the specifications for the
frequency modulation technique point out an important goal for the
additive synthesis method. On the other hand, the complexities of
tone which are revealed by analysis, and which are confirmed to be
perceptually salient in additive synthesis, point out the necessary
levels of complexity which must be accommodated by the frequency
modulation technique. As the latter technique is then made more
complex, it  in fact enters the category  of additive synthesis.  The
ultimate model for  simulation will draw  from the research  findings
using both methods.  
.NEXT PAGE
A common  aspect of  research with  both methods  is the concern  for
perceptual   verification  of   any   particular  results   at  hand.
Experimental methods from perceptual psychology are employed  for the
rigorous verification of the success of simulation, in terms of the
discriminability of synthesized tones from real ones and in terms of the
naturalness of simulation. In addition, to assist in the development
of a general algorithm, we will have to formulate a general model for
the  perception  of  timbre.     This  general  model  will   provide
information for  the construction of  perceptually-based higher-order
simulation algorithms.   We employ a spatial model for the subjective
structure of the perceptual relationships between signals.   Research
is directed at uncovering the dimensionality of the subjective space,
the psychophysical relationships which are structurally correlated to
this space,  and the properties of the space.   The existence of such
constraints  as categorical  boundaries  will be  investigated  in an
attempt to assess the continuity of the subjective space for  timbre.
In the  same regard,   we will  also examine  the effects of  musical
training or context on the structure of the space.  The model will be
evaluated by our ability  to predict the  mappings of real and  novel
tones.
.END
.GROUP SKIP 2
.SELECT A
1. ADDITIVE SYNTHESIS BASED ON THE ANALYSIS OF REAL TONES
.GROUP SKIP 1
.SELECT C
INTRODUCTION TO SYNTHESIS AND ANALYSIS TECHNIQUES
.SELECT 1
.BEGIN FILL ADJUST
This section will introduce  our approach to the simulation  of music
instrument  tones using additive  synthesis based on  the analysis of
tones from actual instruments. Additive synthesis considers a complex
sound to be the sum of a set of sinusoidal components,  or harmonics.
A  basic  presentation  of  synthesis  and  analysis  techniques will
follow.  These  are based on computer  processes that analyze a  real
tone,  which  has  been  recorded  and digitized,  into  time-varying
frequency and  amplitude functions  for  each of  its harmonics.    A
concrete example  of this  is given in  Figure 1  for the first  four
harmonics  of a violin  tone. Given the  results of  analysis, we can
then reproduce the tone by additive synthesis, in which the
sinusoidal components are controlled in amplitude and frequency by
the analyzed functions and their outputs are added together to
constitute the complex music instrument tone. Various other methods
for displaying sets of amplitude and frequency functions are given in
Figures 2 through 4.
.NEXT PAGE
%5synthesis%1

In additive synthesis we physically model a complex sound waveform as
a  sum of sinusoids  with slowly time-varying  amplitudes and phases.
The process of synthesis involves specifying the amplitude  and phase
(equivalently, amplitude  and frequency) for each  component sinusoid
as it  varies with time throughout the duration of the tone.  We will
generally  refer  to  this  specification  as  being  a  time-varying
function, amplitude  or frequency, for  a component sinusoid.
These  sinusoids are  added  together to  produce  the
complex waveform. Equation (1) summarizes this formulation.
.END
.GROUP 
.SELECT 3


(1)  $$F_\alpha = \sum_{n=1}^{M} A_n \sin(\omega_n \alpha h + \theta_n)$$

Notation:    α is the sample number
             h is the time between consecutive samples
             F_α is the sampled, digitized waveform at time αh
             A_n is the amplitude of the nth partial tone
                 and is assumed to be slowly varying with time
             θ_n is the phase of the nth partial tone
                 and is assumed to be slowly varying with time
             ω_n is the radian frequency of the nth partial tone
             M is the number of partial tones
		

.SELECT 1
.APART
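Equation (1) translates almost directly into code. The following is a minimal sketch in Python, not the program used in this work: it assumes each A_n and θ_n is supplied as a function of time t = αh (in seconds) and that each ω_n is a fixed radian frequency. The function name additive_synthesis and its parameters are hypothetical.

```python
import math


def additive_synthesis(amp_fns, phase_fns, omegas, h, num_samples):
    """Evaluate equation (1): F_alpha = sum_n A_n sin(omega_n * alpha * h + theta_n).

    amp_fns[i](t) and phase_fns[i](t) return A_n and theta_n at time t (seconds)
    for the i-th partial; omegas[i] is its fixed radian frequency; h is the
    sampling interval in seconds.  Returns num_samples waveform samples.
    """
    samples = []
    for alpha in range(num_samples):
        t = alpha * h                      # time of sample alpha
        value = 0.0
        for amp, phase, omega in zip(amp_fns, phase_fns, omegas):
            value += amp(t) * math.sin(omega * t + phase(t))
        samples.append(value)
    return samples


# Hypothetical usage: one partial, A = 1, theta = 0, at 250 Hz sampled every 1 ms.
w = 2 * math.pi * 250.0
print(additive_synthesis([lambda t: 1.0], [lambda t: 0.0], [w], 1 / 1000, 4))
```

With a 1 ms step at 250 Hz each sample advances a quarter period, so the four samples are (to rounding) 0, 1, 0 and -1. Because the partials are simply summed, synthesizing two partials together gives the same waveform as synthesizing each alone and adding the results.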
.BEGIN FILL ADJUST
One can see that  from this model, if we can  determine the functions
A%8n%1 and %4q%8n%1  of a tone from a musical instrument, we can then
synthesize  an  approximation  to  the  waveform  F%8α%1  from  those
functions by  use of equation (1).  The degree to which  this form of
synthesis has  been successful will be discussed below.  To determine
the functions A%8n%1 and %4q%8n%1 of a music instrument tone, we must
assume  the frequencies of  the partial  tones, %4w%8n%1,  are nearly
harmonically related. By harmonically related, we mean that the  tone
has a fundamental frequency,  %4w%1, and that the frequencies  of all
the  partials of the  tone are  integer multiples of  the fundamental
frequency. That is, the  frequency of the n%2th%1 partial,  %4w%8n%1,
is approximately n%4w%1.

It should be pointed out that equation (1) could have been formulated
with time-varying frequencies and constant phases. The two formulations
are equivalent: in all practical cases, either can be derived
from the other. We will speak interchangeably of the `phases' of the
harmonics  and the  `frequencies' of the  harmonics as  a function of
time. In the context of the analysis of tones, it  is most natural to
produce the phases of the harmonics as functions of time, as is shown
in Appendix  A.    For  intuitive purposes,    however,  it  is  more
instructive to view displays  of the frequencies of the  harmonics as
functions  of  time,  and we  will  therefore  usually  refer to  the
frequency (rather than phase) functions of harmonics. In Figure  1 we
present an example  of a set of time-varying  amplitude and frequency
functions for four component sinusoids of a tone.  We would use these
functions  in  additive  synthesis  to  control  the  amplitudes  and
frequencies of  four components of  a complex tone.   In fact,   they
would constitute the first four harmonics of the tone,  their average
frequencies being approximately 308 Hz, 616 Hz,  924 Hz, and 1232 Hz,
respectively.  We  should note that these functions were actually the
result of  the computer-analysis  of a  real tone,    which was  tape
recorded and then digitized.  
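The equivalence of the two formulations can be stated in one line: a partial written as sin(ω_n αh + θ_n), with θ_n slowly varying, has instantaneous radian frequency ω_n + dθ_n/dt. In discrete time this means a frequency function can be summed into a phase function, and a phase function differenced back into a frequency function. The sketch below assumes both are sampled every h seconds and uses the simplest running sum and forward difference; the function names are hypothetical.

```python
import math


def phase_from_frequency(freq_hz, h, nominal_hz):
    """Integrate a sampled frequency function (Hz) into the phase
    deviation theta_n (radians) relative to a fixed nominal frequency.

    Uses a running sum (rectangular rule) over samples spaced h seconds apart.
    """
    theta = [0.0]
    for f in freq_hz[:-1]:
        theta.append(theta[-1] + 2 * math.pi * (f - nominal_hz) * h)
    return theta


def frequency_from_phase(theta, h, nominal_hz):
    """Differentiate a phase deviation (radians) back into frequency (Hz)
    with a forward difference; the result has one fewer sample."""
    return [nominal_hz + (b - a) / (2 * math.pi * h)
            for a, b in zip(theta, theta[1:])]
```

For instance, a partial held 1 Hz above its nominal frequency accumulates a phase deviation that grows by 2πh radians per sample, and differencing that phase recovers the 1 Hz offset.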
.GROUP SKIP 2
%5analysis for additive synthesis and graphic techniques%1

The  method we  have  found  most useful  for  analysis we  call  the
`heterodyne filter.' This is described in detail in Moorer (1973) and
is derived briefly in Appendix A. We take the digitized waveform of a
single music instrument tone and for each harmonic under analysis, we
perform  the  following  operations:  We  form  the  products of  the
digitized waveform with  a sine and cosine  at the frequency of  that
harmonic and compute  the average of each product  over one period of
the fundamental frequency of the tone. The square root of the sum  of
the  squares  of  these two  averages  is  an  approximation  to  the
amplitude  of the  harmonic in  question at that  point in  time. The
inverse  tangent  of   the  ratio  of  these   two  averages  is   an
approximation to the phase of the  harmonic in question at that point
in  time. We repeat this process throughout  the duration of the note
for all harmonics.
.END
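The heterodyne procedure just described can be sketched as follows. This is a simplified illustration, not the implementation of Moorer (1973) or Appendix A: it assumes the fundamental f0 is known and constant and that one period is close to a whole number of samples, and it steps through the tone one period at a time. Two choices are ours: the amplitude is doubled, because averaging the product of a sinusoid of amplitude A with a unit sinusoid of the same frequency yields A/2; and the phase uses atan2, a quadrant-aware form of the inverse tangent of the ratio of the two averages. The name heterodyne_analyze is hypothetical.

```python
import math


def heterodyne_analyze(x, fs, f0, harmonic, hop=None):
    """Estimate the amplitude and phase of one harmonic through time.

    x: digitized waveform; fs: sampling rate (Hz); f0: fundamental (Hz);
    harmonic: harmonic number n.  Returns a list of (time_s, amplitude, phase).
    """
    period = round(fs / f0)              # samples per fundamental period (assumed ~integer)
    hop = hop or period                  # by default, one estimate per period
    w = 2 * math.pi * harmonic * f0 / fs # harmonic frequency in radians per sample
    frames = []
    for start in range(0, len(x) - period + 1, hop):
        s = c = 0.0
        for k in range(start, start + period):
            s += x[k] * math.sin(w * k)  # product with a sine at the harmonic
            c += x[k] * math.cos(w * k)  # product with a cosine at the harmonic
        s /= period                      # average over one fundamental period
        c /= period
        amplitude = 2.0 * math.hypot(s, c)   # averages carry a factor of 1/2
        phase = math.atan2(c, s)             # inverse tangent of c/s, correct quadrant
        frames.append((start / fs, amplitude, phase))
    return frames
```

For a test tone 0.8 sin(2π·250t + 0.3) + 0.5 sin(2π·750t - 1.0) sampled at 25,000 Hz, analyzing harmonic 1 returns amplitude 0.8 and phase 0.3 in every period, harmonic 3 returns 0.5 and -1.0, and harmonic 2 returns essentially zero amplitude.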
.BEGIN FILL ADJUST
As aids to the researcher, we have designed several different methods
for displaying the  results of analysis. The output of the heterodyne
filter can, of course, be displayed as a number of isolated amplitude
and frequency functions,   covering the individual  components, as is
shown in  Figure 1 for the first four harmonics of a violin tone. The
total  duration  of the  tone  is  about  400  milliseconds  and  its
fundamental  frequency  is about  308  Hz.    Sixteen harmonics  were
actually analyzed for this tone,  but we  present the  isolated plots  
for only  the first four of these in Figure 1.  Three more pages of
such plots would cover the remaining harmonics; however, the first
page is sufficient to convey the sort of information which
can be obtained from this form of display.
.NEXT PAGE
To obtain a more easily-grasped picture  of the relationships between
all harmonics  of a tone, it has been  found most informative to view
the entire set of harmonics  together.  One method designed for  this
is the three-dimensional perspective plot. Figure 2 shows such a plot
of  the amplitudes of all  sixteen partials of  the same violin tone.
The fundamental  appears as  the  backmost function  in the  picture,
while the highest harmonic  is represented as the frontmost function.
This form of display allows us to more readily discover relationships
among the harmonics.   The perspective plot can be  spatially rotated
on-line  by the computer,   so that the  observer is able  to see the
three-dimensional representation from any angle.  This has  been very
helpful in getting a more comprehensive understanding of the behavior
of the partials of a tone as a function of time.

Another form of display revealing the evolution  of the partials of a
tone  as a  function of  time is  the sequential  line-spectrum plot.
Here, we  make  use of  animation  techniques to  present  successive
moments in the  tone, presenting a plot of the  amplitudes of all the
harmonics  at each moment  in time.   One plot is shown  in Figure 3,
taken from  the  middle of  the violin  tone.  This strictly  on-line
display presents such two-dimensional frequency by amplitude plots of
the partials  for successive  instants in  time, and  the viewer  can
follow the amplitude changes  for the partials from the  beginning to
the end of the tone.

A  fourth way  of  examining the  output  of the  heterodyne  filter,
inspired by the conventional speech spectrograph,  is given in Figure
4.   The particular  advantage of  this form  of display  is that  it
presents  both  frequency and  amplitude  information  at once  in  a
concise  plot, allowing us  to view relationships between  the two as
functions of time. Here, the thickness of each bar is proportional to
the  log of  the amplitude  of that  harmonic. The  vertical position
represents its instantaneous frequency,  as determined from the phase
drift  of  the  harmonic.    The  utility  of  this  display  is  its
representation of the phase information with respect to amplitudes.
.END
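The quantities plotted in Figure 4 can be derived from per-period analysis output such as that of the heterodyne filter. The sketch below assumes a list of (time, amplitude, phase) frames for one harmonic, spaced hop_s seconds apart. It reads instantaneous frequency from the phase drift between frames, wrapped so that a jump across ±π counts as a small change (which presumes the drift per frame stays below π radians), and it makes bar thickness proportional to the amplitude in decibels above an assumed floor, since the logarithm of an amplitude below 1 is negative. The names and the -60 dB default floor are hypothetical.

```python
import math


def spectrograph_bars(frames, nominal_hz, hop_s, floor_db=-60.0, scale=1.0):
    """Compute Figure-4-style bar data for one harmonic.

    frames: list of (time_s, amplitude, phase_rad) taken every hop_s seconds.
    Returns a list of (time_s, frequency_hz, thickness), one per frame pair.
    """
    bars = []
    for (t0, a0, p0), (_, _, p1) in zip(frames, frames[1:]):
        # Phase drift between frames, wrapped into [-pi, pi).
        drift = (p1 - p0 + math.pi) % (2 * math.pi) - math.pi
        # Vertical position: nominal frequency plus the drift rate in Hz.
        freq = nominal_hz + drift / (2 * math.pi * hop_s)
        # Thickness: log amplitude expressed in dB above floor_db, never negative.
        level_db = 20 * math.log10(a0) if a0 > 0 else floor_db
        thickness = scale * max(0.0, level_db - floor_db)
        bars.append((t0, freq, thickness))
    return bars
```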
.GROUP SKIP 2
.SELECT C
CURRENT RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
We begin the discussion  of our current research with  the results of
the  perceptual  evaluation  of  the  analysis-synthesis strategy,  a
necessary test  of  the usefulness  of  this approach  in  simulating
natural tones.   Experienced listeners verified that  the strategy of
additive  synthesis based  on the  results of heterodyne  analysis is
indeed   capable   of   producing   tones   that   are   perceptually
indistinguishable  from   their  respective  original   recorded  and
digitized  tones.    The  main  function  of  the  analysis-synthesis
technique is to provide a starting point for simulating signals which
retain  the perceptually salient  features of  complex natural tones.
We begin with the most complete  set of data on the signal which  can
be provided by analysis. We are assured that additive synthesis based
on  this highly detailed  information will produce a  signal which is
indistinguishable from the original (compare Examples 1 & 2, the digitized
and synthesized versions of the violin tone given in Figures 1 through 4).

Our goal is to determine which aspects of  the simulated signal,  and
therefore  the  original signal,   are  perceptually  important. Some
attempts have  been made in  the past  to reduce  the highly  complex
information derived  from the  analysis of real  tones to  just those
perceptually important  aspects of the signal which are necessary for
its simulation (Risset,  1966;  Strong & Clark,  1967a, 1967b).   Our
current research involves an extension  of this type of work. We will
first discuss the  systematic modifications  of tones  which we  have
performed in our  work, namely the  filtering of signals in  order to
localize their most perceptually important components. Synthesizing
tones from selected components rather than the full set of analyzed
harmonics, a highly accurate form of `filtering', was found to provide
a means to study the important cues for the identification of
instruments.

We will then describe  a main direction in our  current research: the
simplification of  the very complex data  structure which is obtained
from the analysis of  the physical properties of  a real tone.   This
process,  generally referred to as data reduction, serves many of the
goals  which we have set  for an ideal  simulation algorithm. We will
describe the  remarkable success  which we  have had  in using  small
numbers of line segments  to represent the time-varying amplitude and
frequency functions  for  the components  of various  tones. We  will
mainly  cite the  results of  studies on the  violin as  one concrete
example, and  briefly mention  related findings  for  other types  of
instruments. A  strikingly successful  reduction of the  complex data
obtained  from the analysis of a violin tone  (shown in Figures 1 & 2
and presented as Recorded Example 1) was  a representation of each  amplitude
and frequency function by  only three line segments (shown in Figures
5 & 6 and Recorded Example 1).
.END
.GROUP SKIP 2
.SELECT 5
perceptual evaluation of analysis-synthesis strategy
.SELECT 1
.BEGIN FILL ADJUST
It is  necessary  to confirm  the utility  of our  analysis-synthesis
strategy  for the  simulation of  musical tones on  the basis  of the
perceptual success of the synthesized signal.  That  is to
say,   we must  establish that  the signal which  is produced  by our
analysis-synthesis technique is indiscriminable from the digitized recording of
the original sound.   The critical test,   then,  is a  comparison of
the  sound which  has been  produced by  additive synthesis  with the
original musical tone that was  analyzed with the heterodyne  filter.
Informal experimentation, in which experienced listeners compared the
original  recorded tones,  played  directly back  after digitization,
with the tones that were synthesized  on the basis of analysis,   has
shown  that  the  analysis-synthesis  method  produces  an  extremely
convincing  replication of the original  signal. The strategy thereby
has been perceptually verified as being capable  of reproducing natural
tones. This is in  agreement with the findings of other investigators
who have attempted to verify similar analysis techniques by comparing
tones synthesized  from analysis with  the original  signals (Risset,
1966; Freedman, 1967, 1968).

Among  the  types   of  natural  musical  signals  which   have  been
successfully  simulated by the analysis-synthesis  procedure are tones
from the string,   woodwind,   and brass  families of the  orchestra.
Specifically, we have been able to reproduce tones of various pitches
and durations from the following instruments:
.SELECT C
.NARROW 10,10
violin, viola, cello, double bass, trumpet, trombone,
French horn, baritone horn, oboe, English horn, bassoon,
Bb clarinet, alto clarinet, bass clarinet,   
flute, alto flute, alto sax,  soprano sax.  
.WIDEN
.SELECT 1
The  comparisons  between  the original  digitized  tones  and  their
respective  simulations  by  experienced  listeners,   musicians  and
acousticians,    have  demonstrated   the  potential  power  of   our
analysis-synthesis technique.   The digitized and  synthesized violin
tones  (the analysis of  which is shown  in Figures 1  through 4) are
presented in Recorded Example 1.
.END
.GROUP SKIP 2
.SELECT 5
filtering of signals to localize perceptual cues
.SELECT 1
.BEGIN FILL ADJUST
One sort of modification which we have  employed to reveal perceptual
cues for the  identification of tones is that of filtering.  Directly
given by  our analysis-synthesis  method is  the  power to  precisely
select the harmonics which will be synthesized.  We have applied this
modification to a number of signals, to make a preliminary evaluation
of its usefulness for localizing, in the  frequency plane, perceptual
cues for the identification of a number of music instrument tones. As
we had expected,   there is a fairly  broad variation in the  minimal
number  of  lower  harmonics which  are  necessary  to  transmit  the
identity of an  instrument.  This variation occurred even though many
of the tones  started with  the same  number of harmonics.   A  close
study of  the variation of  identification with the  low-pass cut-off
frequencies  for the  individual tones gave  a rough  estimate of the
location  of various  perceptual  cues  for  these instruments.    We
concluded  from these preliminary tests  that the selective filtering
of signals could  indeed provide  much significant information  about
the relationships between the physical properties of sounds and their
perceived qualities.  We propose further work below, but give here an
example of the results of this testing on the violin.

The violin tone examined is the one displayed in Figures 1 through 4.
A number of filtering operations  were performed on this signal,  and
we had experienced listeners, including several musicians, attempt to
identify which instrument had produced the  tones.  Identification of
this particular  source was increasingly difficult for most listeners
in the low-pass filtering condition as the cut-off was  reduced below
the tenth harmonic. At that point the signal was no longer identified
definitely as a violin, but only as one of a set of string
instruments. This contrasted with the results obtained with
various other instrument sources (some of which had an equal number
of analyzable components to start with), which could be correctly
identified with much lower cut-offs. For example, the clarinet,
which also started with 16 harmonics, could be identified from only
harmonics 1, 3, and 5. With the violin, however, the prominent
activity in the seventh through eleventh harmonics, especially during the
attack, appears to be of great perceptual importance. In a
high-pass filtering condition, we found that identification of the
violin source became difficult when only the first three harmonics were
absent. This was found with most of the other signals tested
(however,    prominent  cues  for   certain  brass  instruments  were
associated with  patterns of modulation during their attack segments,
and these sources could often  be identified from a single  component
which  displayed the  modulation  pattern).   More complex  selective
filtering  strategies again confirmed the  importance of the activity
of the seventh through  eleventh harmonics for the  identification of
the violin tone. 
.END
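Because synthesis proceeds partial by partial, the `filters' used in these tests amount to choosing which harmonics to pass to the synthesizer, with no transition band or ripple. A minimal sketch follows; the names are hypothetical and, for brevity, each partial has a steady amplitude and phase in place of the time-varying analyzed functions.

```python
import math


def select_harmonics(partials, keep):
    """Return the partials whose harmonic number n satisfies keep(n).

    partials maps harmonic number -> (amplitude, phase); only the keys are
    inspected, so time-varying functions could be stored instead.
    """
    return {n: data for n, data in partials.items() if keep(n)}


def low_pass(cutoff):
    return lambda n: n <= cutoff      # pass harmonics 1..cutoff


def high_pass(lowest):
    return lambda n: n >= lowest      # remove harmonics below `lowest`


def only(*numbers):
    wanted = set(numbers)
    return lambda n: n in wanted      # arbitrary selective "filter"


def synthesize(partials, f0, fs, num_samples):
    """Sum the selected partials as steady sinusoids at n * f0 Hz."""
    return [sum(a * math.sin(2 * math.pi * n * f0 * k / fs + p)
                for n, (a, p) in partials.items())
            for k in range(num_samples)]


# Hypothetical 16-harmonic tone with amplitudes falling as 1/n.
tone = {n: (1.0 / n, 0.0) for n in range(1, 17)}
print(sorted(select_harmonics(tone, only(1, 3, 5))))   # [1, 3, 5]
```

Low-pass at the tenth harmonic, high-pass removing the first three, and the odd-harmonic subset 1, 3, 5 from the clarinet test each become a single call to select_harmonics before synthesis.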
.GROUP SKIP 2
.SELECT 5
data reduction
.SELECT 1
.BEGIN FILL ADJUST
A general strategy for data reduction is presently being pursued. The
complete  data is initially  represented by 400  to 500 line-segments
per amplitude and frequency function per harmonic. (This,  of course,
is a  reduction of data  from the  25,000 points per  second directly
produced  by the  analysis,  but  does not  significantly distort the
complexity of  microstructure within  the functions.) We  are in  the
process  of  determining  the  minimum  number of  line-segments  per
function which will allow for a successful simulation of the original
signal. Data reduction depends on the empirical
verification of the perceptual fitness of each measure taken: the success
or failure of  a current data reduction  strategy contributes to  the
understanding of  the salient features  in the perception  of musical
tones;  this understanding,  in turn, directs  the next stage of data
reduction. 
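Where the breakpoints should be placed is exactly what this research settles by listening. Purely to illustrate what representing a sampled function by a few straight lines involves, the sketch below uses a greedy top-down split in the spirit of the Ramer-Douglas-Peucker algorithm: start from one segment joining the first and last samples, then repeatedly split at the sample farthest (vertically) from the current approximation. This generic technique is substituted here for illustration and is not the method used in this work; the function names are hypothetical.

```python
def reduce_to_segments(times, values, num_segments):
    """Approximate a sampled function by num_segments straight lines.

    Greedy top-down: start with one segment from the first to the last
    sample, then repeatedly split at the sample farthest (vertically) from
    the current piecewise-linear curve.  Returns breakpoint indices.
    """
    breaks = [0, len(times) - 1]
    while len(breaks) - 1 < num_segments:
        worst_i, worst_err = None, 0.0
        for left, right in zip(breaks, breaks[1:]):
            t0, v0, t1, v1 = times[left], values[left], times[right], values[right]
            for i in range(left + 1, right):
                line = v0 + (v1 - v0) * (times[i] - t0) / (t1 - t0)
                err = abs(values[i] - line)
                if err > worst_err:
                    worst_i, worst_err = i, err
        if worst_i is None:       # already exact: nothing left to split on
            break
        breaks = sorted(breaks + [worst_i])
    return breaks


def evaluate(times, values, breaks, t):
    """Value of the piecewise-linear approximation at time t."""
    for left, right in zip(breaks, breaks[1:]):
        if times[left] <= t <= times[right]:
            t0, t1 = times[left], times[right]
            return values[left] + (values[right] - values[left]) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the analyzed duration")


# Hypothetical envelope: linear attack to sample 10, hold, linear decay from 80.
times = list(range(101))
values = [i / 10 if i <= 10 else (1.0 if i < 80 else (100 - i) / 20) for i in times]
print(reduce_to_segments(times, values, 3))   # [0, 10, 80, 100]
```

For this attack-hold-decay envelope, three segments recover the corners exactly; for a real analyzed function the same call would trade its 400 to 500 segments for a chosen handful, with listening tests deciding whether the result is acceptable.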

Reduction  efforts have  met  with a  surprising  degree of  success.
Simulations based on  as few as three line-segments per amplitude and
frequency function  have  been indistinguishable  from  the  original
signals  by  experienced  listeners.    The  shapes  of  the  reduced
functions  are now  empirically derived  from the  originally complex
curves. An example of a successful three  line-segment data reduction
of the  amplitude and frequency functions for  the violin tone shown
earlier in Figures 1 & 2 (Recorded  Example 1) is shown  in
Figures 5 & 6 (also Recorded Example 1). This represents an enormous
step  in  data reduction,  where,    for example,    the  violin tone
referred to could be represented by a total of fewer than 200 numbers
rather  than over  16,000  (assuming approximately  500 segments  per
function).   The  resources of the  computer are  optimally used with
respect to the  storage of parameters for  synthesis,  the number  of
input-output operations required,  and the size and complexity needed
in the program used for synthesis.  
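The storage figures quoted above can be checked with a little arithmetic. The violin tone has 16 analyzed harmonics, each with one amplitude and one frequency function. How each segment was stored is not stated, so the sketch assumes two numbers per segment (for example, a duration and an end value); the constant and function names are hypothetical.

```python
HARMONICS = 16                # harmonics analyzed for the violin tone
FUNCTIONS_PER_HARMONIC = 2    # one amplitude and one frequency function
NUMBERS_PER_SEGMENT = 2       # ASSUMPTION: e.g. a segment duration and an end value


def numbers_needed(segments_per_function):
    """Total numbers stored for one tone at a given segment count."""
    return (HARMONICS * FUNCTIONS_PER_HARMONIC
            * segments_per_function * NUMBERS_PER_SEGMENT)


full = numbers_needed(500)    # 16 * 2 * 500 * 2 = 32,000
reduced = numbers_needed(3)   # 16 * 2 * 3 * 2 = 192
print(full, reduced, round(full / reduced))   # 32000 192 167
```

Under this assumption the full representation needs 32,000 numbers (consistent with "over 16,000") and the three-segment version 192 (consistent with "fewer than 200"), a reduction of roughly 167 to 1.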

The success of these reductions also represents a major discovery about
the  perception of timbre, since  the subtle micro-fluctuations which
occur in the physical parameters  of these signals seem to have  very
little perceptual importance, even in laboratory listening conditions
where  tones are presented  in temporal isolation  for comparison. At
present,  we   are   attempting  to   represent   functions  by   two
line-segments,   and  have had  encouraging results  with  the violin
tone,  the only one tested  so far. This instrument, we should  note,
has been  one  of the  most challenging  sources  for data  reduction
attempts,  and  presents a  good  test of  any  particular technique.
Previously,   an effort was  made to  employ constants,   instead  of
time-varying functions,  for the frequencies of the components.  This
was found to produce a noticeable change in the quality of the violin
tone, although several of the other signals tested suffered much less
discriminable alterations.  The  change was described by listeners as
a decrease in the strength of the attack of the signal.  The tone was
still  considered  to   retain  the  quality  of  naturalness,     as
established  from informal  reports,   including  the response  of an
experienced  violin  player.   Our  general   conclusion  from   this
preliminary  study  was  that time-varying  frequency  functions  are
necessary  to  exactly  replicate  certain  second-order  features of
tones.   This  does  not  preclude  the substitution  of  some  other
physical manipulation for these features.
.END
.NEXT PAGE
.SELECT C
PROPOSED RESEARCH 
.SELECT 1
.BEGIN FILL ADJUST
We will  now  turn to  our proposed  research, and  begin by  briefly
discussing the necessary extension of the range of timbres covered by
the additive synthesis technique. The eventual development of truly
general techniques is contingent upon this extension of the cases
examined. We next describe our plans for a systematic exploration of
data reduction  techniques,   which include the  rigorous testing  of
particular  methods  by  perceptual  scaling  experiments.   We  then
discuss a practical result from this exploration: the development  of
automatic data reduction algorithms. Reduced  data structures for the
physical attributes of music instrument tones provides the researcher
with a better tool to investigate the more general aspects  of timbre
perception for whole sets of  natural sources. This will be amplified
in a  latter section, devoted to the applications of multidimensional
scaling techniques to  timbre perception.   We will here discuss  the
higher-order  algorithms which should  result for  additive synthesis
from the above research, algorithms which give the user  perceptually
meaningful controls and which make optimal  use of computer resources
for the simulation of tones.
.END
.GROUP SKIP 2
.SELECT 5
extension of timbral range
.SELECT 1
.BEGIN FILL ADJUST
A  necessary step  for  the  eventual  development of  truly  general
simulation  techniques  is  the  application  of  our methods  to  an
extended set  of sources.   For  this purpose,   we  are planning  to
gather a  large collection of  tones from string, woodwind  and brass
families of musical instruments. Notes of several durations, played
in different manners, will be recorded throughout the ranges of all
instruments in these families. As our research progresses, we will
cover an increasingly broad base of signals, and will thereby
investigate the perception of a very diverse set of cases and be
guided toward a more general system for simulation. Because the goal
of our endeavors is a technique that can realistically simulate any
instrumental sound, with any specific characteristics, in any context
that could occur in reality, our data base will necessarily be
extensive. Widening this data base is an important part of our
future research, and an integral feature of all the phases of
investigation presented below.
.END
.NEXT PAGE
.SELECT 5
systematic exploration of data reduction techniques
.SELECT 1
.BEGIN FILL ADJUST
A series of rigorous discrimination studies is planned, in which a
wide range of signals and reduction specifications will be
investigated. The basic approach is to observe the perceptual
effects of systematic modifications and simplifications of the data
that control synthesis. Listeners will attempt to discriminate among
the original digitized tones, tones synthesized from their complete
analyses, and tones synthesized from significantly simplified
parametric data. For a representative population of listeners with
varying degrees of musical training, the results of this testing
will give us the strongest evidence as to which aspects of the signal
are important to perception and which are insignificant and need not
be present in a simulation.

The discriminability of signals is a standard perceptual measurement.
The   experimental  procedure   employed   for  the   measurement  of
discriminability will require the listener to judge whether a pair
of tones is `same' or `different'. The experiment is entirely
controlled by computer: pairs of tones are randomly selected from the
stimulus set and played to the listener, whose response is tabulated
for later analysis. The computation to synthesize the tones is done
beforehand, and the digital waveform representing each tone is stored
on the bulk-storage disk; once this computation is complete, the
stored tones are played by the computer through the digital-to-analog
converter. The only analog equipment used is the standard audio
system.
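
One way to realize the randomized same/different procedure is
sketched below in Python, as an assumption rather than a description
of our actual experiment program: every `same' pair and both orders
of every `different' pair are presented a fixed number of times in
shuffled order, and the proportion of `different' responses is
tabulated for each pair of tones. Playback from disk is omitted, the
stimulus labels and function names are hypothetical, and the
simulated listener simply responds correctly on every trial.
.END
```python
import random
from collections import defaultdict
from itertools import combinations

def make_trials(stimuli, repeats, seed=None):
    """Randomized trial list: each 'same' pair and both orders of each
    'different' pair, every one presented `repeats` times."""
    pairs = [(s, s) for s in stimuli]
    for a, b in combinations(stimuli, 2):
        pairs += [(a, b), (b, a)]
    trials = pairs * repeats
    random.Random(seed).shuffle(trials)
    return trials

def tabulate(trials, responses):
    """Proportion of 'different' responses for each unordered pair of tones."""
    counts = defaultdict(lambda: [0, 0])      # pair -> [n_different, n_trials]
    for (a, b), said_different in zip(trials, responses):
        key = tuple(sorted((a, b)))
        counts[key][0] += int(said_different)
        counts[key][1] += 1
    return {pair: n_diff / n for pair, (n_diff, n) in counts.items()}

# Hypothetical stimulus labels for one instrument tone.
stimuli = ["original", "full_synthesis", "reduced"]
trials = make_trials(stimuli, repeats=4, seed=1)

# Stand-in for a listener: here, a perfect discriminator.
responses = [a != b for a, b in trials]
table = tabulate(trials, responses)
```
.BEGIN FILL ADJUST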

Because we are concerned with the simulation of signals which are
highly realistic to the listener, we will be interested in measuring
the `naturalness' of simulations for listeners with varying degrees
of musical training. We will test tones both in temporal isolation
and in complex sequences, to determine how the context in which tones
are presented affects judgments of their naturalness. We realize
that this measurement could be subject to much variability, and we
feel that it is important to examine carefully the factors which
might be correlated with this variability, such as the background of
the listener and the context of the signals. If the techniques that
we develop are to achieve generality, it will be important to
evaluate the adequacy of simulations with respect to these factors.
In these experiments, listeners will apply an N-point rating scale of
relative `naturalness' to a particular set of tones, which will
include the digitized real tones and discriminable simulations, some
of which result from drastically simplified methods, such as
fixed-waveform synthesis, in which spectral dynamics are absent.
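
The rating data might be organized as in the following Python
sketch, which groups N-point `naturalness' scores by tone,
presentation context and listener group, and reports the mean,
standard deviation and number of ratings in each cell. The category
labels, the choice N = 7 and the tuple layout are assumptions for
illustration, and the example values only show the input format;
they are not experimental data.
.END
```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_ratings(ratings, n_points):
    """ratings: iterable of (tone, context, listener_group, score), where
    score is an integer on a 1..n_points naturalness scale.
    Returns {(tone, context, listener_group): (mean, sd, count)}."""
    cells = defaultdict(list)
    for tone, context, group, score in ratings:
        if not 1 <= score <= n_points:
            raise ValueError("score out of range: " + str(score))
        cells[(tone, context, group)].append(score)
    return {
        cell: (mean(s), stdev(s) if len(s) > 1 else 0.0, len(s))
        for cell, s in cells.items()
    }

# Placeholder values showing the input format only; these are not data.
example = [
    ("fixed_waveform", "isolated", "musician", 2),
    ("fixed_waveform", "isolated", "musician", 3),
    ("fixed_waveform", "sequence", "nonmusician", 5),
]
summary = summarize_ratings(example, n_points=7)
```
.BEGIN FILL ADJUST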
.NEXT PAGE
Our preliminary findings suggest general success with as few as
three line-segments per control function. Even if this vastly
simplified representation of natural signals turns out to be the
limiting case, rapid progress can be made in understanding the
psychophysical relationships involved in their perception and
identification. Investigation of these relationships, between the
subjective, perceived qualities of tones and their physical
properties, will be greatly facilitated by the simplified
representation of those physical properties. We will closely study
the importance of the relative attack slopes and onset times of the
components, the permissible range of variation in spectral levels,
and the need to preserve exactly various other overall
characteristics of the analyzed functions for each harmonic.
.END
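.BEGIN FILL ADJUST
As an illustration of how attack features can be read directly from
a line-segment representation, the Python sketch below takes each
harmonic's amplitude function as a short list of (time, amplitude)
breakpoints and computes its onset time relative to the earliest
harmonic and its attack slope from onset to peak. The definitions of
onset (the last breakpoint at or below a threshold before the peak)
and of slope used here are our own assumptions, and the breakpoint
values are hypothetical.
.END
```python
def attack_features(breakpoints, threshold=0.0):
    """breakpoints: [(time_s, amplitude), ...] with increasing times,
    describing one harmonic's line-segment amplitude function.
    Onset = last breakpoint at or below `threshold` before the peak;
    attack slope = amplitude rise per second from onset to peak."""
    peak = max(range(len(breakpoints)), key=lambda i: breakpoints[i][1])
    onset = 0
    for i in range(peak):
        if breakpoints[i][1] <= threshold:
            onset = i
    t0, a0 = breakpoints[onset]
    t1, a1 = breakpoints[peak]
    slope = (a1 - a0) / (t1 - t0) if t1 > t0 else float("inf")
    return t0, slope

def relative_attacks(harmonics, threshold=0.0):
    """Onset of each harmonic relative to the earliest one, with its slope."""
    feats = {k: attack_features(bp, threshold) for k, bp in harmonics.items()}
    first = min(onset for onset, _ in feats.values())
    return {k: (onset - first, slope) for k, (onset, slope) in feats.items()}

# Hypothetical three-segment amplitude functions for two harmonics.
harmonics = {
    1: [(0.00, 0.0), (0.04, 1.0), (0.30, 0.8), (0.50, 0.0)],
    2: [(0.00, 0.0), (0.01, 0.0), (0.07, 0.6), (0.50, 0.0)],
}
print(relative_attacks(harmonics))
```
.BEGIN FILL ADJUST
Because each function has only a few breakpoints, such features can
be altered one at a time in the synthesis data, which is what the
planned perceptual comparisons require.
.END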
.GROUP SKIP 2
.SELECT 5
automatic data reduction algorithms
.SELECT 1
.BEGIN FILL ADJUST
As we examine a broader range of signals, we will be able to design
algorithms for the automatic reduction of analysis data, to replace
our initial empirical method of reduction. The first step will be to
automate the line-segment fitting of complex time-varying amplitude
and frequency functions. The best type of fitting procedure, and the
permissible range of variability in the reduction, will have been
established for a variety of signals by the research outlined above.
More sophisticated routines for data reduction will draw on the
related research on timbre perception described below.
.END
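.BEGIN FILL ADJUST
A minimal sketch of automatic line-segment fitting follows, in
Python, assuming a greedy maximum-deviation strategy closely related
to the Ramer-Douglas-Peucker line simplification algorithm:
breakpoints are added one at a time where the current approximation
is worst, until a tolerance or a segment budget is met. This is only
one plausible candidate; the procedure actually adopted, and its
tolerance, are to be determined by the perceptual research outlined
above. The function name and parameters are hypothetical.
.END
```python
import numpy as np

def fit_line_segments(times, values, max_segments, tolerance):
    """Greedy automatic breakpoint selection.

    Start with the first and last samples, then repeatedly add the sample at
    which the current piecewise-linear approximation deviates most from the
    analyzed function, until the worst deviation is within `tolerance` or
    `max_segments` segments are in use. Returns sorted breakpoint indices."""
    if tolerance < 0 or max_segments < 1 or len(times) < 2:
        raise ValueError("need tolerance >= 0, max_segments >= 1, >= 2 samples")
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    chosen = [0, len(times) - 1]
    while len(chosen) - 1 < max_segments:
        approx = np.interp(times, times[chosen], values[chosen])
        error = np.abs(values - approx)
        worst = int(np.argmax(error))
        if error[worst] <= tolerance:
            break
        chosen = sorted(chosen + [worst])
    return chosen

# A function made of exactly two straight lines is recovered exactly.
t = np.arange(11.0)
triangle = np.interp(t, [0.0, 3.0, 10.0], [0.0, 1.0, 0.0])
print(fit_line_segments(t, triangle, max_segments=5, tolerance=1e-9))
# -> [0, 3, 10]
```
.BEGIN FILL ADJUST
In practice the tolerance would be chosen from the discrimination
results rather than fixed in advance.
.END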
.GROUP SKIP 2
.SELECT 5
higher-order algorithms
.SELECT 1
.BEGIN FILL ADJUST
The systematic exploration of the relationships between the known
physical properties of tones and their perceptual correlates will
reveal the salient cues for their identification, and hence the
features necessary for their simulation. The perceptual scaling
experiments described below are designed to uncover the dimensions
and properties of the subjective space for timbre. With this
information we can begin to approach a general model for the auditory
processing of complex natural signals. We will also have developed a
successful set of strategies for data reduction in the computer
simulation of these signals. At that point we will be able to
investigate the more central aspects of auditory information
processing: the internal representations of complex natural stimuli,
and the perception of these stimuli in complex temporal contexts.
This information will lead to powerful computer simulation
techniques able to produce realistic sounds in highly complex,
realistic contexts.

Higher-order simulation techniques will be a direct product of our
research with additive synthesis, in conjunction with findings from
the frequency modulation approach described next. A higher-order
simulation algorithm will give the user a powerful level of control
over the salient aspects of a tone, while reducing the input
specifications to the simulation procedure and making them more
experientially relevant. As we come to understand the perceptually
important features of tone, the simulation algorithm will reflect
this understanding. Features such as the relative attack slopes and
onset times of the components will be expressed in a simplified
graphical-relational form, such as the overall evolution through time
of the bandwidth of the signal's energy distribution, and will be
dealt with directly by the user. Other potentially important
features of tone, such as amplitude or frequency modulation of the
control functions or the presence of bands of noise, will be
controlled by the user through simple, meaningful specifications.
.END
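.BEGIN FILL ADJUST
The following Python sketch illustrates, under explicit assumptions,
what such a higher-order control might look like: the user supplies
only two line-segment functions of time, an overall amplitude and a
`spread' parameter, and the program derives the amplitude function of
every harmonic from them before performing additive synthesis. The
geometric roll-off across harmonics, the parameter names and the
numerical values are hypothetical choices made for illustration, not
features of our existing system.
.END
```python
import numpy as np

def harmonic_amplitudes(overall, spread, n_harmonics):
    """Hypothetical higher-order control: derive every harmonic's amplitude
    function from two user-level functions of time. `overall` scales the whole
    tone; `spread` (0 <= spread < 1) is the ratio between successive harmonics,
    so larger values distribute energy over a wider band."""
    k = np.arange(n_harmonics)[:, None]            # 0, 1, ..., n-1 as a column
    return overall[None, :] * spread[None, :] ** k

def additive_synthesis(f0, amps, sample_rate):
    """Sum harmonics 1..n of f0, each with its own amplitude function."""
    n_harmonics, n_samples = amps.shape
    t = np.arange(n_samples) / sample_rate
    k = np.arange(1, n_harmonics + 1)[:, None]
    return np.sum(amps * np.sin(2 * np.pi * k * f0 * t), axis=0)

sample_rate = 8000                                 # illustrative values only
t = np.arange(4000) / sample_rate                  # half a second
# Both controls given as line-segments: energy spreads upward during the
# attack and narrows again during the decay.
overall = np.interp(t, [0.0, 0.05, 0.5], [0.0, 1.0, 0.0])
spread = np.interp(t, [0.0, 0.05, 0.5], [0.2, 0.7, 0.3])
amps = harmonic_amplitudes(overall, spread, n_harmonics=10)
tone = additive_synthesis(220.0, amps, sample_rate)
```
.BEGIN FILL ADJUST
Larger values of the spread parameter give the upper harmonics more
energy and so widen the bandwidth of the energy distribution; a
single line-segment function thus stands in for many per-harmonic
amplitude functions.
.END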
.GROUP SKIP 2